On-Chip Network Designs for Many-Core Computational Platforms
Abstract
Processor designers have been placing more processing elements (PEs) on a single chip to make efficient use of technology scaling and to improve system performance through increased parallelism. Networks-on-chip (NoCs) have been shown to be promising for the scalable interconnection of large numbers of PEs in comparison to structures such as point-to-point interconnects or global buses. This dissertation investigates the design of on-chip interconnection networks for many-core computational platforms in three application domains: high-performance networks for applications with high communication bandwidths; low-cost networks for application-specific, low-bandwidth dynamic traffic; and reconfigurable networks for platforms targeting digital signal processing (DSP) applications, which have deterministic inter-task communication characteristics.
An on-chip router architecture named RoShaQ is proposed for platforms executing general-purpose applications with dynamic, high-bandwidth communication. RoShaQ maximizes buffer utilization by allowing multiple buffer queues to be shared among input ports, and hence achieves high network performance. Experimental results show that, given the same buffer capacity and averaged over a broad range of traffic patterns, RoShaQ has 17.2% lower latency, 18.2% higher saturation throughput, and 8.3% lower energy dissipated per bit than state-of-the-art virtual-channel routers.
For mapping applications with low inter-task communication bandwidths, five low-cost bufferless routers are proposed. All of these routers guarantee in-order packet delivery, so expensive reordering buffers are not required. The proposed bufferless routers have lower costs and higher performance per unit cost than all buffered wormhole routers: the smallest proposed bufferless router has 32.4% less area, 24.5% higher throughput, 29.5% lower latency, 10.0% lower power, and 26.5% lower energy per bit than the smallest buffered router.
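As a rough illustration of the "performance per unit cost" claim, the quoted percentages can be combined into relative ratios. This is only a sketch: the abstract does not define the metric, so dividing the throughput ratio by the area or power ratio is an assumption made here, and the `gain` helper is hypothetical.

```python
# Figures quoted in the abstract, expressed relative to the smallest
# buffered router (which is normalized to 1.0).
area_ratio = 1.0 - 0.324        # 32.4% less area
throughput_ratio = 1.0 + 0.245  # 24.5% higher throughput
power_ratio = 1.0 - 0.100       # 10.0% lower power


def gain(performance_ratio: float, cost_ratio: float) -> float:
    """Relative performance per unit cost.

    Assumed metric (not defined in the abstract): performance / cost.
    """
    return performance_ratio / cost_ratio


throughput_per_area = gain(throughput_ratio, area_ratio)
throughput_per_power = gain(throughput_ratio, power_ratio)

print(f"throughput per unit area:  {throughput_per_area:.2f}x")
print(f"throughput per unit power: {throughput_per_power:.2f}x")
```

Under this assumed definition, the smallest bufferless router delivers roughly 1.84 times the throughput per unit area and roughly 1.38 times the throughput per unit power of the smallest buffered router.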
A globally asynchronous locally synchronous (GALS)-compatible, reconfigurable, circuit-switched on-chip network is proposed for many-core platforms targeting streaming DSP and embedded applications, which show deterministic inter-task communication traffic. Inter-processor communication is achieved through a simple yet effective source-synchronous technique that can sustain the ideal throughput of one word per cycle, with latency approaching the wire delay. This network was used in a GALS many-core chip fabricated in 65 nm CMOS. For evaluating...
Similar resources
Sustainable Computing: Informatics and Systems
Several emerging application domains in scientific computing demand high computation throughputs to achieve terascale or higher performance. Dedicated centers hosting scientific computing tools on a few high-end servers could rely on hardware accelerator co-processors that contain multiple lightweight custom cores interconnected through an on-chip network. With increasing workloads, these many-...
Timing analysis of network on chip architectures for MP-SoC platforms
Recently, the use of multiprocessor system-on-chip (MP-SoC) platforms has emerged as an important integrated circuit design trend for high-performance computing applications. As the number of reusable intellectual property (IP) blocks on such platforms continues to increase, many have argued that monolithic bus-based interconnect architectures will not be able to support the clock cycle require...
Design of a Low-Latency Router Based on Virtual Output Queuing and Bypass Channels for Wireless Network-on-Chip
Wireless network-on-chip (WiNoC) is considered a novel approach for designing future multi-core systems. In WiNoCs, wireless routers (WRs) utilize high-bandwidth wireless links to reduce the transmission delay between long-distance nodes. When the network traffic loads increase, a large number of packets will be sent into the wired and wireless links and can...
Partitioning the Network-on-Chip to Enable Virtualization on Many-Core Processors
Technological advances have increased transistor density, thereby ushering in multi- and, more recently, many-core systems, distinguished by the presence of hundreds of cores on a single chip. For such a platform, the Network-on-Chip (NoC) has emerged as a scalable and efficient interconnect fabric to realize the communication across an ever increasing number of processor cores, memories, and s...
Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge because of the difficulty of scheduling hardware resources with respect to thread concurrency. In this paper, to resolve this problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...